.. _`Learning Curve`:

.. _`org.sysess.sympathy.machinelearning.learningcurve`:

Learning Curve
``````````````

.. image:: learning_curve.svg
   :width: 48


Generates a learning curve by training the model multiple times on incrementally larger subsets of the data, using cross-validation for scoring. Plot the train-mean performance against the test-mean performance to obtain the curve.


Documentation
:::::::::::::

A learning curve shows the validation and training score of an estimator for varying numbers
of training samples. It is a tool to find out how much we benefit from adding more training
data and whether the estimator suffers more from a variance error or a bias error.

A cross-validation generator splits the whole dataset k times into training and test data.
Subsets of the training set with varying sizes are used to train the estimator, and a score
is computed on both the training subset and the test set for each subset size. Afterwards,
the scores are averaged over all k runs for each training subset size.
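
The procedure described above can be sketched with scikit-learn's
``learning_curve`` function. This is only an illustration under stated
assumptions, not the node's actual implementation: the synthetic dataset, the
``LogisticRegression`` estimator, and the way the node's settings might map to
``cv``, ``shuffle`` and ``train_sizes`` are all chosen for the example.

.. code-block:: python

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import learning_curve

    # Synthetic data standing in for the X and Y input tables (assumption).
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    # Train on 5 subset sizes, from 20% to 100% of each training fold,
    # and score every subset size with 5-fold cross-validation.
    train_sizes, train_scores, test_scores = learning_curve(
        LogisticRegression(max_iter=1000),
        X,
        y,
        train_sizes=np.linspace(0.2, 1.0, 5),
        cv=5,
        shuffle=True,
        random_state=0,
    )

    # Average over the k folds: one mean score per training subset size.
    train_mean = train_scores.mean(axis=1)
    test_mean = test_scores.mean(axis=1)

Plotting ``train_mean`` and ``test_mean`` against ``train_sizes`` gives the
learning curve. A persistent gap between the two suggests a variance error,
while two low, converged curves suggest a bias error.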


Definition
::::::::::


Input ports
...........

    **model**
        | Type: model
        | Description: Model
    **X**
        | Type: table
        | Description: X
    **Y**
        | Type: table
        | Description: Y

Output ports
............

    **results**
        | Type: table
        | Description: results
    **statistics**
        | Type: table
        | Description: statistics


Configuration
.............

    **Cross validation folds** (cv)
        Number of folds used for cross-validation (minimum 2)
    **Shuffle** (shuffle)
        Randomizes the input dataset before it is passed to the internal cross-validation
    **Smallest fraction** (smallest)
        Size of the smallest dataset as a fraction of the total
    **Steps** (steps)
        Number of different sizes of training/test data measured
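
As a rough sketch of how these settings might interact, the snippet below
derives the measured training sizes from ``smallest``, ``steps`` and ``cv``.
It assumes evenly spaced fractions between ``smallest`` and 1.0, which the
node documentation does not confirm; the helper names
``training_fractions`` and ``training_sizes`` are hypothetical.

.. code-block:: python

    def training_fractions(smallest, steps):
        """Evenly spaced fractions from `smallest` up to 1.0 (assumed spacing)."""
        if steps < 2:
            return [1.0]
        step = (1.0 - smallest) / (steps - 1)
        return [smallest + i * step for i in range(steps)]


    def training_sizes(n_samples, cv, smallest, steps):
        """Number of training samples used at each step of the curve."""
        # With k-fold cross-validation, each run trains on (cv - 1) / cv
        # of the data; the remaining fold is held out for testing.
        pool = n_samples * (cv - 1) // cv
        return [round(f * pool) for f in training_fractions(smallest, steps)]


    fractions = training_fractions(smallest=0.2, steps=5)
    sizes = training_sizes(n_samples=200, cv=5, smallest=0.2, steps=5)

A larger ``steps`` value gives a smoother curve at the cost of more model
fits, since the model is trained once per fold for every step.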


Implementation
..............

.. automodule:: node_metrics
    :noindex:

.. class:: LearningCurve
    :noindex: